WIT3: Web Inventory of Transcribed and Translated Talks

Authors

  • Mauro Cettolo
  • Christian Girardi
  • Marcello Federico
Abstract

We describe here a Web inventory named WIT3 that offers access to a collection of transcribed and translated talks. The core of WIT3 is the TED Talks corpus, which redistributes the original content published by the TED Conference website (http://www.ted.com). Since 2007, the TED Conference, based in California, has been posting all video recordings of its talks together with subtitles in English and their translations in more than 80 languages. Aside from its cultural and social relevance, this content, which is published under the Creative Commons BY-NC-ND license, also represents a precious language resource for the machine translation research community, thanks to its size, variety of topics, and covered languages. This effort repurposes the original content in a way that is more convenient for machine translation researchers.

Similar resources

An Arabic-Hebrew parallel corpus of TED talks

We describe an Arabic-Hebrew parallel corpus of TED talks built upon WIT, the Web inventory that repurposes the original content of the TED website in a way which is more convenient for MT researchers. The benchmark consists of about 2,000 talks, whose subtitles in Arabic and Hebrew have been accurately aligned and rearranged in sentences, for a total of about 3.5M tokens per language. Talks ha...


An investigation into the frequency of Language Related Episodes in the EFL learners’ Homogeneous and Heterogeneous Dyadic Interaction

This study attempted to compare the relative frequency of the occurrence of Language Related Episodes (LREs) in the dyadic talks of pairs who were homogeneous and heterogeneous in terms of English proficiency. LREs are those parts of the conversations where the interlocutors explicitly focus on linguistic form. The study was carried out with 60 Iranian university students of teaching English a...


Competition of Discourses in Journalistic Translation: Diplomatic Negotiations in Focus

We sought to understand whether, how, and why the translated journalistic texts related to the Iranian nuclear negotiations were manipulated. To this end, we monitored a news agency’s Webpage in a time span of 46 days that began 3 days before Almaty I nuclear talks and ended 3 days after Almaty II talks. Monitoring resulted in a corpus made up of 36 target texts p...


Gauged Supergravities in Three Dimensions: A Panoramic Overview

Maximal and non-maximal supergravities in three spacetime dimensions allow for a large variety of semisimple and non-semisimple gauge groups, as well as complex gauge groups that have no analog in higher dimensions. In this contribution we review the recent progress in constructing these theories and discuss some of their possible applications. Based on talks by B. de Wit and H. Nicolai at the ...


First-Encounter Talks between Younger and Older Adults in Taiwan: A Conversation Analysis Approach

Outside of Western contexts, natural-conversation-based research on intergenerational communication is relatively rare. To help redress this imbalance, this paper explores the conversational styles of first-encounter talks between five pairs of college students and older adults in Taiwan, and infers the interactional norms that underlie them. It is found that younger Taiwanese adults tend to ex...




Publication date: 2012